Hallucination of Speech Recognition Errors With Sequence to Sequence Learning
نویسندگان
چکیده
Prior work in this domain has focused on modeling errors at the phonetic level, while using a lexicon to convert phones words, usually accompanied by an FST Language model. We present novel end-to-end models directly predict hallucinated ASR word sequence outputs, conditioning input as well corresponding phoneme sequence. This improves prior published results for recall of from in-domain system’s transcription unseen data, out-of-domain transcriptions audio unrelated task, additionally exploring in-between scenario when limited characterization data test system is obtainable. To verify extrinsic validity method, we also use our augment training spoken question classifier, finding that they enable robustness real downstream scarce or even zero task-specific was available train-time.
منابع مشابه
Sequence Learning and Speech Recognition
This paper describes application possibilities for statistical methods and particulary hidden Markov models in the domain of sequence learning after treating the required basics. Especially the task of natural language processing is treated by elaborating on speech recognition and speech translation. Furthermore a network intrusion detection system to detect complex and coordinated Internet att...
متن کاملA Comparison of Sequence-to-Sequence Models for Speech Recognition
In this work, we conduct a detailed evaluation of various allneural, end-to-end trained, sequence-to-sequence models applied to the task of speech recognition. Notably, each of these systems directly predicts graphemes in the written domain, without using an external pronunciation lexicon, or a separate language model. We examine several sequence-to-sequence models including connectionist tempo...
متن کاملAn online sequence-to-sequence model for noisy speech recognition
Generative models have long been the dominant approach for speech recognition. The success of these models however relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners. Recent innovations in Deep Learning have given rise to an alternative – discriminative models called Sequence-to-Sequence models, that can almost match the accur...
متن کاملSequence to sequence learning for unconstrained scene text recognition
In this work we present a state-of-the-art approach for unconstrained natural scene text recognition. We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture followed by a long short term memory model (LSTM). The CNN learns visual features for the characters and uses them with a softmax layer to detect sequence of characters. While the CNN gives very go...
متن کاملSequence to Sequence Learning for Optical Character Recognition
We propose an end-to-end recurrent encoder-decoder based sequence learning approach for printed text Optical Character Recognition (OCR). In contrast to present day existing state-of-art OCR solution [Graves et al. (2006)] which uses CTC output layer, our approach makes minimalistic assumptions on the structure and length of the sequence. We use a two step encoder-decoder approach – (a) A recur...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE/ACM transactions on audio, speech, and language processing
سال: 2022
ISSN: ['2329-9304', '2329-9290']
DOI: https://doi.org/10.1109/taslp.2022.3145313